26.1. Full Bayesian Learning

Okay. So the last thing I want to do in machine learning is what is called statistical learning, where we essentially use Bayesian network techniques for learning. We have, in a sense, been using Bayesian networks as models of the world that can be used to make probabilistic predictions about the world. The problem, of course, is: where do these networks come from? How do we learn them? No theory is really complete without giving ourselves a way of learning such networks, and the answer, of course, is that we learn them. I want to show you how that works, as the logical conclusion of all the probabilistic material we have been developing. The idea is relatively simple. The idea of Bayesian learning is that, rather than making a hard, zero-one commitment to a single best hypothesis, why not use probabilistic methods to learn a probability distribution over the hypothesis space? Give yourself a little bit of softness there. Rather than using the hard methods of weeding out hypotheses that we have used in learning so far, we think of the hypothesis as a random variable and of the examples as observations of another random variable connected to it, and then we see how we can apply the methods we developed earlier.

So we have a hypothesis variable H, which may or may not have a known prior distribution. The prior is what you know before you have seen any examples. Then you get the observations d_j, which are the outcomes of another random variable, and the training data is just the collection of these observations. This sounds a lot like what we have seen before. So we can use the data to compute the posterior probability of a hypothesis h_i given the data observed so far. Using Bayes' rule to turn the conditional probability around, this posterior is a normalization constant times the probability of the data given the hypothesis, times the prior probability of the hypothesis itself. The prior is something we know, and the term P(d | h_i) is what we call the likelihood of the data under the hypothesis.
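
Written out as a formula (with alpha denoting the normalization constant), this is Bayes' rule applied to the hypothesis:

\[
P(h_i \mid \mathbf{d}) \;=\; \alpha \, P(\mathbf{d} \mid h_i)\, P(h_i)
\]

where P(h_i) is the prior over hypotheses and P(d | h_i) is the likelihood of the data.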

We can now make predictions about the outcome of some quantity X. We do that by summing out over the hypotheses: the prediction is the sum, over all hypotheses, of the probability of X given the data and the hypothesis, times the posterior of that hypothesis. And because X is independent of the data once the hypothesis is given, the first factor simplifies to the probability of X given the hypothesis alone.
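
Written out, the prediction formula just described is

\[
P(X \mid \mathbf{d}) \;=\; \sum_i P(X \mid \mathbf{d}, h_i)\, P(h_i \mid \mathbf{d}) \;=\; \sum_i P(X \mid h_i)\, P(h_i \mid \mathbf{d}),
\]

where the second equality uses the independence of X from the data given the hypothesis.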

The nice thing about this is that we do not have to pick a best hypothesis: we can make predictions without choosing a hypothesis at all. The hypothesis is, in a sense, hidden in the way this works; we evolve our beliefs about it at the same time as the data comes in. Here is an example.

There is a new kind of sweet on the market. Since sweets are a commodity, manufacturers keep inventing features to make them more attractive, and here the invention is an element of surprise. The company sells five kinds of bags containing two kinds of sweets, lime and cherry; for the sake of argument, we prefer cherry over lime. The bags all look the same, and their contents also look the same because each sweet is wrapped; only once you unwrap one can you see whether it is lime or cherry. Of the five kinds of bags, one is all cherry, one is all lime, two contain three quarters cherries or three quarters limes respectively, and one is a 50-50 mix. The prior distribution over these five kinds is 10%, 20%, 40%, 20%, 10%, which adds up nicely to 100%.
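
As a concrete encoding of this setup (a minimal sketch; the variable names are mine, not from the lecture), the five hypotheses and their prior could be written as:

```python
# Five bag types h1..h5, described by their fraction of cherry candies,
# together with the prior distribution just stated.
cherry_fraction = [1.00, 0.75, 0.50, 0.25, 0.00]  # h1 = all cherry, ..., h5 = all lime
prior           = [0.10, 0.20, 0.40, 0.20, 0.10]  # 10%, 20%, 40%, 20%, 10%
```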

That's the setup. We are interested in a couple of things. One: if I have unwrapped a couple of candies and they were all lime, what is the probability that the next candy is actually cherry, which we prefer? Two: given the candies unwrapped so far, what is the probability that the bag I bought, whose outside gives no indication of its contents, belongs to each of the five classes? Those are the things we are interested in. And we make the usual assumption, namely that the observations are IID, independently and identically distributed, so that the likelihood of the data factorizes into the product of the likelihoods of the individual observations. We can ensure this by making the bags very big, or by unwrapping each candy, rewrapping it, and putting it back into the bag, which we would rather not do. We would rather have big bags of candy, even if that does not give us exactly IID data; it is near enough.
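
Written out, the IID assumption means that the likelihood of the whole data set d = d_1, ..., d_N factorizes as

\[
P(\mathbf{d} \mid h_i) \;=\; \prod_{j=1}^{N} P(d_j \mid h_i).
\]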

So what do we do? Let us take the prediction formula and the posterior formula and run an experiment: what happens if every candy we unwrap turns out to be lime? Here are the results. If we have unwrapped zero candies, the posteriors are simply the prior probabilities.
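
Here is a minimal sketch of that experiment in Python, continuing the encoding from above (the function names are mine, not from the lecture). It computes the posterior over the five bag types after observing N lime candies in a row, together with the resulting prediction that the next candy is lime:

```python
def posterior_after_limes(n_limes, cherry_fraction, prior):
    """P(h_i | first n_limes candies were all lime), via Bayes' rule."""
    # Likelihood of n IID lime observations under each hypothesis:
    # P(d | h_i) = (1 - cherry_fraction_i) ** n_limes
    unnormalized = [p * (1.0 - f) ** n_limes for f, p in zip(cherry_fraction, prior)]
    alpha = 1.0 / sum(unnormalized)  # normalization constant
    return [alpha * u for u in unnormalized]

def predict_next_is_lime(posterior, cherry_fraction):
    """P(next candy is lime | data) = sum_i P(lime | h_i) * P(h_i | data)."""
    return sum(q * (1.0 - f) for q, f in zip(posterior, cherry_fraction))

cherry_fraction = [1.00, 0.75, 0.50, 0.25, 0.00]
prior           = [0.10, 0.20, 0.40, 0.20, 0.10]

for n in range(11):
    post = posterior_after_limes(n, cherry_fraction, prior)
    print(n, [round(q, 3) for q in post],
          round(predict_next_is_lime(post, cherry_fraction), 3))
```

With zero observations the prediction that the next candy is lime is 0.5 (the prior average); as more and more limes are unwrapped, the all-lime hypothesis comes to dominate the posterior and the prediction approaches 1.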

Part of chapter: Chapter 26. Statistical Learning
Access: Open Access
Duration: 00:13:26 min
Recording date: 2021-03-30
Uploaded: 2021-03-30 17:27:55
Language: en-US

The Candy Flavors Example to introduce Full Bayesian Learning and its properties. 
